(54) Digital signal processor architecture optimized for performing fast Fourier transforms



(57) A digital signal processor architecture particularly
adapted for performing fast Fourier Transform algorithms
efficiently. The architecture comprises dual, parallel
multiply and accumulate (MAC) units in which the output of
the multiplier circuit portion of each MAC is cross-coupled
to an input of the adder unit of the other MAC as well as to
an input of the adder unit of the same MAC to which the
multiplier belongs.
Description

Field of the Invention 

The invention pertains to architectures for digital
signal processors. More particularly, the invention per- 
tains to processor architectures for performing fast Fou- 
rier Transforms. 

Background of the Invention 

The Fourier Transform is a well-known mathemati- 
cal operation for converting a signal from the time do- 
main to the frequency domain. A Fourier Transform op- 
erates on a signal which is varying in time to derive the 
frequency components in the signal and their magni- 
tudes. In the digital domain, the discrete Fourier Trans- 
form (DFT) is used to convert from the time domain to 
the frequency domain. 

Fourier Transforms, and particularly discrete Fouri- 
er Transforms have many applications. One common 
application is in digital speech processing. For example, 
the wireless communications field, and particularly the 
cellular telephone communication field, has seen an ex- 
ponential growth in data traffic over the past several 
years. However, the bandwidth availability for wireless 
communications is extremely limited. Accordingly, much 
effort has been dedicated to encoding speech data into 
a highly compressed form for transmission. For in- 
stance, a person can speak into a digital cellular tele- 
phone containing circuitry and/or software to digitize the 
speech, convert or compress it into a highly compressed 
digital format and transmit the compressed digital data. 
The receiving device contains circuitry and/or software 
for decoding the compressed digital data back into the 
original digital signal, converting it back to analog form 
and providing it to a listener. Digital encoding schemes 
for highly compressing video signals also are in wide
use today, MPEG and JPEG being two of the more commonly
known compression standards.

Some researchers are working on developing fre- 
quency domain compression algorithms for speech, vid- 
eo and other data. As such, an analog signal is first dig- 
itized and then converted into the frequency domain be- 
fore it can be compressed. Accordingly, there is a need 
for a method and apparatus for performing Fourier 
Transforms as quickly as possible. Particularly, in order 
for frequency domain compression algorithms to be 
practical in the cellular telephone environment for 
speech signals, a Fourier Transform must be able to be 
performed essentially in real time. 

One particularly fast way to perform discrete Fourier 
Transforms is known as the Fast Fourier Transform 
(FFT) method. Although there are many different algo- 
rithms for performing FFT, they all share a basic canon- 
ical unit operation that is repeated many times with dif- 
ferent variables, but all sharing the same basic set of 
mathematical operations. The FFT algorithms can be 
performed in a programmable environment or by dedi- 
cated hardware. By programmable environment, we 
mean that the operation is performed primarily by soft- 
ware running on a general purpose machine, such as a 
FFT software algorithm running on a standard personal 
computer (PC). To date, purpose built dedicated hard- 
ware circuits for performing fast Fourier Transforms are 
under development that can approach the speeds need- 
ed for real-time applications. However, purpose built 
hardware is expensive and generally cannot be used for 
other purposes, but only for performing FFTs. Program- 
mable environment solutions of fast Fourier Transform 
algorithms generally are less expensive than dedicated 
hardware, but usually are slower. 

Accordingly, it is an object of the present invention 
to provide an improved hardware design for performing 
fast Fourier Transforms. 

It is another object of the present invention to pro- 
vide an improved digital processor apparatus for per- 
forming fast Fourier Transforms. 

It is a further object of the present invention to pro- 
vide an improved general purpose digital processor hav- 
ing an architecture that can perform fast Fourier Trans- 
forms very quickly. 

It is yet another object of the present invention to 
provide an improved general purpose digital processor 
having an architecture that can perform fast Fourier 
Transforms very quickly in a programmable environ- 
ment. 

Summary of the Invention 

The invention is a general purpose digital processor 
architecture that is particularly adapted for performing 
fast Fourier Transforms extremely efficiently. According- 
ly, using a processing device employing the architecture 
of the present invention, one can perform fast Fourier 
Transforms in a programmable environment extremely 
efficiently. 

Particularly, the architecture of the present inven- 
tion utilizes two parallel multiply and accumulate (MAC) 
units with a crossover coupling between the two MAC 
units. 

The canonical unit of the FFT algorithm is the "but- 
terfly" operation, in which the sum and difference of two 
complex products are generated. In the architecture of 
the present invention, the two parallel MACs simultane- 
ously perform the two multiplication operations at the 
core of the butterfly operation. The multiply circuit of 
each MAC is followed by an adder circuit. The outputs 
of the two adders are forwarded to a common accumu- 
lator register file. The output of each multiply unit is cou- 
pled to one input terminal of the corresponding adder as 
well as to one input terminal of the adder of the other 
MAC unit. A third input terminal to each of the add cir- 
cuits is coupled to the output of the common accumula- 
tor. 

In this manner, one half of the entire canonical butterfly
unit of the FFT calculation (i.e., the real or imaginary
portion of the complex calculation) can be performed in a
single instruction cycle.

Brief Description of the Drawings 

Figure 1 is a graphical representation of a sinusoi- 
dal signal in the time domain. 

Figure 2 is a graphical representation of the sinu- 
soidal signal shown in Figure 1 transformed into the fre- 
quency domain. 

Figure 3 is a graphical representation of an exem- 
plary amplitude modulated signal in the time domain. 

Figure 4 is a graphical representation of the ampli- 
tude modulated signal of Figure 3 transformed into the 
frequency domain. 

Figure 5 is a graphical representation of the canon- 
ical butterfly unit of fast Fourier Transform algorithms. 

Figure 6 is a graphical representation of an exem- 
plary complete fast Fourier Transform algorithm. 

Figure 7 is a block diagram of a digital signal proc- 
essor architecture in accordance with the present inven- 
tion. 

Detailed Description of a Preferred Embodiment of the 
Invention 

Figure 1 is a plot showing magnitude of an exem- 
plary electromagnetic signal plotted against time. The 
signal is a sinusoidal signal of fixed frequency $\omega_0$. As
noted above, the signal can be converted into the fre- 
quency domain by means of a Fourier Transform to de- 
termine the frequency components within the analog 
signal. This analog signal also can be digitized by sam- 
pling the signal at discrete instances in time with a fixed 
period between the samples. In order to avoid aliasing, 
the Nyquist condition must be met, i.e., the sampling 
rate must be at least twice the maximum frequency com- 
ponent of the signal. 

In the digital domain, the digital samples of the sig- 
nal can be put through a discrete Fourier Transform 
(DFT) mathematical algorithm to determine the frequen- 
cy components of the signal in a discrete manner. In par- 
ticular, in DFT analysis, the signal can be partitioned into 
contiguous segments of any desired duration, each seg- 
ment comprising a plurality of samples. In the example 
shown in Figure 1, the signal is sampled at a period that
provides 16 sample points per segment. Utilizing the 
DFT algorithm, each segment of the signal can be con- 
verted into the frequency domain. The number of dis- 
crete frequency components which can be distin- 
guished in the frequency domain is equal to the number 
of samples in the segment. Accordingly, the discrete 
Fourier Transform of each segment of the signal is dis- 
tinguishable into sixteen different evenly spaced fre- 
quency components in the overall frequency band. The 
overall frequency band is dictated by the sampling fre- 
quency and the bandwidth of the signal and, particularly, 
spans from 0 hertz to one-half the sampling frequency 
as normalized with respect to the bandwidth of the sig- 
nal. 

Figure 2 is a graphical representation of the time domain
signal of Figure 1 transformed into the frequency domain by
DFT analysis. Figure 2 shows a plot of the magnitude of the
signal versus frequency. Of course, since the time domain
signal is a sine wave of fixed frequency, the frequency
domain plot has only one frequency component, namely, a
component at frequency $\omega_0$.

As another example, Figure 3 illustrates a slightly more
complex time domain signal. This is an amplitude modulated
signal with an information content signal at frequency
$\omega_m$ riding on a carrier frequency of $\omega_0$. When
converted into the frequency domain, this signal has
frequency components at $\omega_0 - \omega_m$, $\omega_0$,
and $\omega_0 + \omega_m$.
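The three-component spectrum of an amplitude modulated signal can be checked numerically. The following is a minimal sketch, not part of the original disclosure: the segment length, the carrier and modulation frequencies (expressed as whole cycles per segment), the modulation depth of 0.5 and the helper name `dft` are all illustrative assumptions, and the transform uses the standard negative-exponent DFT convention.

```python
import cmath
import math


def dft(x):
    """Direct DFT: X(k) = sum over n of x(n) * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


N = 64            # samples per segment (illustrative assumption)
CARRIER_BIN = 16  # carrier completes 16 cycles per segment (assumption)
MOD_BIN = 2       # information signal completes 2 cycles per segment (assumption)

# Amplitude modulated signal: (1 + 0.5*cos(w_m*t)) * cos(w_0*t)
am = [
    (1 + 0.5 * math.cos(2 * math.pi * MOD_BIN * n / N))
    * math.cos(2 * math.pi * CARRIER_BIN * n / N)
    for n in range(N)
]

spectrum = dft(am)

# Look only at bins from 0 Hz up to one-half the sampling frequency
# and keep those whose magnitude is not numerically zero.
peaks = [k for k in range(N // 2 + 1) if abs(spectrum[k]) > 1e-6]
print(peaks)  # [14, 16, 18]
```

The only non-zero bins are the carrier bin and the two bins offset from it by the modulation frequency, which corresponds to the components at $\omega_0 - \omega_m$, $\omega_0$ and $\omega_0 + \omega_m$.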



The discrete Fourier Transform is expressed as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N} \qquad \text{(Eq. 1)}$$

where:

N is the number of samples in the designated time 
segment (and thus also the number of discrete fre- 
quency components in the frequency domain sig- 
nal); 

n is the particular index in the time domain sample, 
from n=0 to n=N-1 ; 

x(n) is the magnitude of the time domain signal at 
time sample point corresponding to n; 

k is the particular frequency domain component, 
from k=0 to k=N-1 ; and 

X(k) is the magnitude of the frequency component 
at the frequency corresponding to k. 

As can be seen from the equation above, the computational
load for performing the DFT algorithm is proportional to
$6N^2$. Particularly, each of x(n) and $e^{-j2\pi nk/N}$ is a
complex number. Accordingly, each multiplication operation
involves 4 multiplications and 2 adds, for a total of six
operations. For each X(k), the 4 multiplications and 2 adds
are performed N times. Further, X(k) must be calculated for
k = 0 to N-1, i.e., N times. Accordingly, the computational
load is proportional to $6N^2$.
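The operation count can be illustrated with a direct implementation. This is a sketch under stated assumptions: the function name `dft_with_op_count` is hypothetical, the negative-exponent DFT convention is assumed, and the tally follows the counting convention of the text (six real operations per complex multiplication, with the accumulation adds not counted).

```python
import cmath
import math


def dft_with_op_count(x):
    """Direct DFT that also tallies real operations.

    Counting convention (taken from the text): every complex
    multiplication costs 4 real multiplications and 2 real adds.
    """
    N = len(x)
    real_ops = 0
    X = []
    for k in range(N):          # one output X(k) per frequency index
        acc = 0j
        for n in range(N):      # N complex multiplications per output
            acc += x[n] * cmath.exp(-2j * math.pi * n * k / N)
            real_ops += 6
        X.append(acc)
    return X, real_ops


# A unit impulse has a flat spectrum, which makes the result easy to check.
impulse = [1.0] + [0.0] * 15
X, ops = dft_with_op_count(impulse)
print(ops)  # 1536, i.e. 6 * 16**2
```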

Fast Fourier Transform algorithms reduce the computational
load from being proportional to $6N^2$ to being proportional
to $N \log_2 N$. While there are various algorithms for
performing FFT, all of them share a basic canonical operation
known as the FFT butterfly operation. The radix-2 FFT
algorithms considered here require that $N = 2^R$, where R is
a positive integer.

Figure 5 illustrates the canonical FFT butterfly op- 
eration, while equations 2 and 3 below illustrate the op- 
eration in mathematical notation. 

$$X(m+1) = X(m) + W(N,k)\,Y(m) \qquad \text{(Eq. 2)}$$

$$Y(m+1) = X(m) - W(N,k)\,Y(m) \qquad \text{(Eq. 3)}$$

X and Y are input signals, as discussed in more detail below.
W is a complex variable given by $W = e^{-j2\pi k/N}$. As can
be seen from Figure 5, the term butterfly comes from the fact
that the canonical unit involves two equations, each
involving an operation between the two input signals, X and
Y, and a third variable, W(N,k). Specifically, a first result
is obtained by adding the product of one of the input signals
Y and the variable W(N,k) to the other input signal X, while
the second result is obtained by subtracting the same product
from the same input signal (i.e., the first input signal X).
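Equations 2 and 3 translate directly into a few lines of code. The sketch below is illustrative only: the function names `twiddle` and `butterfly` are hypothetical, and Python's built-in complex type stands in for the processor's separate real and imaginary data paths.

```python
import cmath
import math


def twiddle(N, k):
    """W(N, k) = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * math.pi * k / N)


def butterfly(x, y, w):
    """Return (X(m+1), Y(m+1)) per Eq. 2 and Eq. 3.

    The product w*y is formed once and then both added to and
    subtracted from x.
    """
    t = w * y
    return x + t, x - t


x_next, y_next = butterfly(1 + 2j, 3 - 1j, twiddle(4, 1))
print(x_next, y_next)
```

Because the same product is added in one output and subtracted in the other, the two outputs always sum to twice the first input, which is a convenient sanity check.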

Since X(m), Y(m) and W(N,k) are each a complex number, let us
assume that:

$$X(m) = b_1 + j c_1 \qquad \text{(Eq. 4)}$$

$$Y(m) = b_2 + j c_2 \qquad \text{(Eq. 5)}$$

$$W(N,k) = b_0 + j c_0 \qquad \text{(Eq. 6)}$$

Then,

$$X(m+1) = b_1 + b_0 b_2 - c_0 c_2 + j\,(c_1 + b_0 c_2 + c_0 b_2) \qquad \text{(Eq. 7)}$$

$$Y(m+1) = b_1 - b_0 b_2 + c_0 c_2 + j\,(c_1 - b_0 c_2 - c_0 b_2) \qquad \text{(Eq. 8)}$$

As can be seen, calculating X(m+1) and Y(m+1) calls for 4
multiplications and 4 additions each (2 multiplications and
2 additions for each real part, and likewise for each
imaginary part). However, the four multiplications in the
equation for X(m+1) are the same multiplications as in the
equation for Y(m+1), namely, $b_0 b_2$, $c_0 c_2$, $b_0 c_2$,
and $c_0 b_2$.
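The expansion into real and imaginary parts can be checked against ordinary complex arithmetic. In this sketch the function name `butterfly_real` and the sample values are hypothetical; the variable names follow the b and c notation of equations 4 through 8, and small integer values are used so that the comparison is exact.

```python
def butterfly_real(b1, c1, b2, c2, b0, c0):
    """Butterfly in real arithmetic, following Eq. 7 and Eq. 8.

    X(m) = b1 + j*c1, Y(m) = b2 + j*c2, W(N,k) = b0 + j*c0.
    The four products are computed once and shared by both outputs.
    """
    p_bb = b0 * b2
    p_cc = c0 * c2
    p_bc = b0 * c2
    p_cb = c0 * b2
    x_next = (b1 + p_bb - p_cc, c1 + p_bc + p_cb)   # Eq. 7
    y_next = (b1 - p_bb + p_cc, c1 - p_bc - p_cb)   # Eq. 8
    return x_next, y_next


# Compare with complex arithmetic for X = 1+2j, Y = 3-1j, W = 2+3j.
X, Y, W = 1 + 2j, 3 - 1j, 2 + 3j
x_next, y_next = butterfly_real(1, 2, 3, -1, 2, 3)
print(x_next, y_next)        # (10, 9) (-8, -5)
print(X + W * Y, X - W * Y)  # (10+9j) (-8-5j)
```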

In all FFT algorithms, the canonical FFT butterfly operation
is executed many times with different variables, X, Y and W,
to arrive at the fast Fourier Transform of a time domain
signal segment. Figure 6 helps illustrate a small scale
complete FFT mathematical operation. As shown, the inputs on
the left-hand side of Figure 6 are the time domain samples
X(0) through X(15) from Figure 1, and correspond to input
variables X and Y of equations 2 and 3 above, as explained
more fully below. With sixteen samples, the FFT operation
goes through 4 stages, m=1 through m=4. As can be seen from
equations 2 and 3, the butterfly operation is performed on
pairs of inputs. Since there are 16 samples, in each stage
m=1 through m=4, the butterfly operation is performed 8
times. In stage m=1, for instance, the 8 pairs of inputs are
(1) X(0) and X(1), (2) X(2) and X(3), (3) X(4) and X(5),
(4) X(6) and X(7), (5) X(8) and X(9), (6) X(10) and X(11),
(7) X(12) and X(13), and (8) X(14) and X(15). Thus, for
example, referring to Figure 5, in the very first butterfly
operation, the time domain sample X(0) corresponds to X(m) in
equations 2 and 3, while the time domain input X(1) in
Figure 5 corresponds to Y(m) in equations 2 and 3. In the m=1
stage, the inputs, e.g., X(0) and X(1), are the actual time
domain samples and, therefore, are non-complex (i.e., contain
only a real part). The variable W(N,k), however, is complex.
Accordingly, despite the fact that the original inputs in
stage m=1 are real only, the operation is, nevertheless,
complex. Further, for all subsequent stages, m=2 to m=4,
typically all numbers will be complex.

In the second stage, m=2, the output of the first stage,
corresponding to the X(0) row, is mixed with the output of
the third row, corresponding to X(2), in the butterfly
operation. Likewise, the X(1) row is mixed with the X(3) row,
the X(4) row is mixed with the X(6) row, the X(5) row is
mixed with the X(7) row, the X(8) row is mixed with the X(10)
row, the X(9) row is mixed with the X(11) row, the X(12) row
is mixed with the X(14) row and the X(13) row is mixed with
the X(15) row.

In the third stage, m=3, the X(0) row is mixed with the X(4)
row, the X(1) row is mixed with the X(5) row, the X(2) row is
mixed with the X(6) row, the X(3) row is mixed with the X(7)
row, the X(8) row is mixed with the X(12) row, the X(9) row
is mixed with the X(13) row, the X(10) row is mixed with the
X(14) row and the X(11) row is mixed with the X(15) row.

Finally, in the last stage, m=4, the X(0) row is mixed with
the X(8) row, the X(1) row is mixed with the X(9) row, the
X(2) row is mixed with the X(10) row, the X(3) row is mixed
with the X(11) row, the X(4) row is mixed with the X(12) row,
the X(5) row is mixed with the X(13) row, the X(6) row is
mixed with the X(14) row, and the X(7) row is mixed with the
X(15) row.
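The row pairings listed for the four stages follow a regular pattern that can be generated programmatically. The sketch below assumes that in stage m each row is paired with the row $2^{m-1}$ positions below it, which is the pattern the stage m=1, m=2 and m=4 lists exhibit; the function name `stage_pairs` is hypothetical.

```python
def stage_pairs(num_rows, stage):
    """Return the (upper, lower) row pairs mixed in the given stage.

    Assumption: in stage m the two rows of a butterfly are
    2**(m-1) apart, and a row is the upper input of a butterfly
    when that bit of its index is clear.
    """
    stride = 1 << (stage - 1)
    return [(i, i + stride) for i in range(num_rows) if not i & stride]


for m in range(1, 5):
    print(m, stage_pairs(16, m))
```

For 16 rows this yields 8 pairs in each of the 4 stages, beginning with (0, 1) in stage 1, (0, 2) in stage 2, (0, 4) in stage 3 and (0, 8) in stage 4.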

Thus, in the illustrated example, in which 16 samples are
taken per segment of the time domain signal, the butterfly
operation is performed 8 times in each stage, and there are
4 stages. Accordingly, the butterfly operation is performed
8 x 4 = 32 times.
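The butterfly count can be confirmed with a complete transform. The following is a sketch of a standard textbook radix-2 decimation-in-time FFT, offered as an assumption rather than as the exact flow graph of the drawings: it reorders the inputs into bit-reversed order before the first stage, and the names `fft_radix2`, `bit_reverse_permute` and `dft_reference` are hypothetical.

```python
import cmath
import math


def bit_reverse_permute(x):
    """Reorder x so that element i moves to the bit-reversed index."""
    N = len(x)
    bits = N.bit_length() - 1
    out = [0j] * N
    for i in range(N):
        reversed_index = int(format(i, f"0{bits}b")[::-1], 2)
        out[reversed_index] = x[i]
    return out


def fft_radix2(x):
    """Iterative radix-2 FFT; returns (spectrum, butterfly count)."""
    N = len(x)
    if N < 2 or N & (N - 1):
        raise ValueError("length must be a power of two, N = 2**R, R >= 1")
    rows = bit_reverse_permute([complex(v) for v in x])
    butterflies = 0
    stride = 1                       # distance between paired rows
    while stride < N:                # one pass per stage
        for start in range(0, N, 2 * stride):
            for j in range(stride):
                w = cmath.exp(-2j * math.pi * j / (2 * stride))
                top, bottom = start + j, start + j + stride
                t = w * rows[bottom]
                rows[top], rows[bottom] = rows[top] + t, rows[top] - t
                butterflies += 1
        stride *= 2
    return rows, butterflies


def dft_reference(x):
    """Direct DFT used only to check the FFT result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]


samples = [math.sin(2 * math.pi * 3 * n / 16) for n in range(16)]
spectrum, count = fft_radix2(samples)
print(count)  # 32
```

For a 16-sample segment the loop executes 8 butterflies in each of 4 stages, and the result agrees with the direct DFT to within floating point rounding.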

In most real life situations, the number of samples will be
substantially greater than 16. For example, segments
comprising 256, 512, and 1024 samples are commonly used.

Figure 7 is a block diagram of a processor architecture in
accordance with the present invention employing dual parallel
multiply and accumulate units (MACs) with a crossover
connection between the two MACs which allows the FFT
butterfly operation to be performed in only two instruction
cycles.

The architecture comprises two parallel and cross 
connected MACs A and B. The two parallel MACs are 
essentially identical in structure. MAC A comprises data 
registers 12 and 14 for receiving data from a memory 
100 through memory bus 80. MAC B comprises two 
identical registers 16 and 18. The inputs to all of the
registers 12, 14, 16, 18 are coupled to the memory bus 80.
The output of register 12 is coupled to the first input of 
multiplier unit 22, while the output of register 14 is cou- 
pled to the second input of multiplier unit 22. A similar 
arrangement exists in the second MAC unit, with the out- 
puts of registers 16 and 18 coupled to first and second 
inputs, respectively, of the multiplier unit 24. The outputs 
of the two multiplier units 22 and 24 are coupled to prod- 
uct accumulators 26 and 28, respectively. The product 
accumulators are followed by arithmetic logic units 
(ALUs) 30 and 32, respectively. The outputs of the two 
ALUs are both coupled to a common accumulator reg- 
ister file 34. 

ALU 30 will be described herein in detail, it being
understood that, in the preferred embodiment of the
invention, ALU 32 is identical to ALU 30, except as otherwise
noted. ALU 30 has three input terminals, S0, S1 and S2. It
also has a multiplexer for selecting one of three input
source paths to input S1. Input terminal S0 is coupled to the
output of the accumulator register file 34 in order to
provide wrap-around arithmetic operations from instruction
cycle to instruction cycle. Terminal S1 of ALU 30 is coupled
to the output of the corresponding product accumulator 26 of
MAC A. Terminal S2 of ALU 30 is coupled to the product
accumulator 28 of the parallel MAC B.

Terminal S1 also is coupled to receive signals from 
the output of accumulator register file 34 as well as the 
accumulator register 16. However, with respect to the 
present invention, the only relevant input source to ter- 
minal S1 of ALU 30 is the output of product accumulator 
26. The other connections are provided in order to make 
the processor architecture a general purpose architec- 
ture so as to be useable for a wide variety of other math- 
ematical, logical and other operations. 

MAC B is structurally identical to MAC A. Input ter- 
minal S2 of ALU 32 is coupled to the output of the prod- 
uct accumulator 26 of MAC A and input terminal S1 of 
ALU 32 is coupled to receive the output of the product 
accumulator 28 of MAC B. 

In this dual MAC with crossover connection archi- 
tecture, the canonical FFT butterfly operation can be 
performed in only two cycles. Particularly, referring to 
equations 7 and 8, which are reproduced below again 
for ease of reference, the entire result for the real part 
of equations 7 and 8 can be calculated in one cycle 
since, as mentioned above, the two multiplications re- 
quired in the real portion of equation 7 are the same as 
the two multiplications in the real portion of equation 8. 



$$X(m+1) = b_1 + b_0 b_2 - c_0 c_2 + j\,(c_1 + b_0 c_2 + c_0 b_2) \qquad \text{(Eq. 7)}$$

$$Y(m+1) = b_1 - b_0 b_2 + c_0 c_2 + j\,(c_1 - b_0 c_2 - c_0 b_2) \qquad \text{(Eq. 8)}$$

Accordingly, if the products $b_0 b_2$ and $c_0 c_2$ can be
calculated simultaneously and then simultaneously added to
and subtracted from $b_1$, half of the butterfly operation
can be performed in one instruction cycle. The architecture
illustrated in Figure 7 provides for such a possibility.

Particularly, register 12 can be supplied with value $b_0$
from memory 100 while register 14 is supplied with value
$b_2$ from memory 100. At the same time, register 16 is
supplied with value $c_2$ and register 18 is supplied with
the value $c_0$. Multiplier 22 calculates and outputs the
product $b_0 b_2$ while multiplier 24 simultaneously
calculates and outputs the product $c_0 c_2$. The output
$b_0 b_2$ is passed through product accumulator 26 to input
terminal S1 of ALU 30 as well as to input terminal S2 of ALU
32. Likewise, the product $c_0 c_2$ is passed through product
accumulator 28 to the S1 input of ALU 32 as well as to the S2
input of ALU 30. The value of $b_1$ is supplied from the
accumulator register file 34 to the S0 input of both ALU 30
and ALU 32. ALU 30 is capable of adding all three values at
its three inputs. It will be understood by those skilled in
the art that the terms "add" and "sum" and variations thereof
as used herein and in the processor field in general
encompass both addition and subtraction. Accordingly, ALU 30
can calculate $b_1 + b_0 b_2 - c_0 c_2$, while ALU 32 is
simultaneously computing $b_1 - b_0 b_2 + c_0 c_2$.
Accordingly, the entire real parts of the solutions for
X(m+1) and Y(m+1) in equations 7 and 8 are calculated
simultaneously in the MAC in a single cycle. In the next
cycle, the same operation can be performed with respect to
the imaginary parts of X(m+1) and Y(m+1). Accordingly, with
this architecture, the entire FFT butterfly operation can be
performed in two instruction cycles.
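The two-cycle schedule can be modelled in software. The sketch below is a behavioural illustration only, not a description of the actual hardware: the function names, the representation of each ALU's add and subtract selection as a pair of signs, and the operand loading order for the imaginary-part cycle are all assumptions made for the example.

```python
def alu(s0, s1, s2, sign1, sign2):
    """Three-input ALU: S0 plus or minus S1 plus or minus S2."""
    return s0 + sign1 * s1 + sign2 * s2


def dual_mac_cycle(operands, s0, signs_a, signs_b):
    """One instruction cycle of the cross-coupled dual MAC model.

    operands = (a_in1, a_in2, b_in1, b_in2): inputs of the MAC A
    multiplier and of the MAC B multiplier. s0 comes from the
    common accumulator and feeds both ALUs.
    """
    a_in1, a_in2, b_in1, b_in2 = operands
    product_a = a_in1 * a_in2     # MAC A multiplier
    product_b = b_in1 * b_in2     # MAC B multiplier
    # Each ALU sees its own product on S1 and, through the
    # crossover, the other MAC's product on S2.
    out_a = alu(s0, product_a, product_b, *signs_a)
    out_b = alu(s0, product_b, product_a, *signs_b)
    return out_a, out_b


def butterfly_two_cycles(b1, c1, b2, c2, b0, c0):
    """Full butterfly (Eq. 7 and Eq. 8) in two modelled cycles."""
    # Cycle 1, real parts: products b0*b2 and c0*c2, S0 = b1.
    x_re, y_re = dual_mac_cycle((b0, b2, c2, c0), b1, (+1, -1), (+1, -1))
    # Cycle 2, imaginary parts: products b0*c2 and c0*b2, S0 = c1.
    x_im, y_im = dual_mac_cycle((b0, c2, c0, b2), c1, (+1, +1), (-1, -1))
    return complex(x_re, x_im), complex(y_re, y_im)


x_next, y_next = butterfly_two_cycles(1, 2, 3, -1, 2, 3)
print(x_next, y_next)  # (10+9j) (-8-5j)
```

Without the crossover, each ALU would see only its own multiplier's product in a given cycle, so neither three-term sum could be completed in a single pass through the model.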

However, the architecture is a general processor architecture
that can perform as wide a variety of mathematical and
logical operations as any other general purpose processor
architecture.

When the full FFT algorithm, including data loading cycles,
is considered, the FFT butterfly operation can be performed
in 4 cycles. With dual parallel MACs without crossover, the
butterfly operation would require at least six cycles, a net
penalty of 50%.

A statistical analysis has been performed in order to
calculate the overall savings in instruction cycles achieved
by the present invention for a practical complete FFT
algorithm. The architecture of the present invention performs
an overall FFT operation in as little as 37% of the time that
would be necessary for an architecture including two parallel
processors operating simultaneously, but without crossover.

Having thus described a few particular embodiments of the
invention, various alterations, modifications, and
improvements will readily occur to those skilled in the art.



Claims 

1. A processor comprising:

first and second multipliers, each having first 
and second inputs for receiving digital signals 
and an output that is the product of signals ap- 
plied at said first and second inputs; 

first and second adders, each having first, sec- 
ond and third inputs for receiving digital signals 
and an output that is the sum of at least two of 
said signals at said inputs of said adders; 

said output of said first multiplier coupled to 
said first input of said first adder and to said sec- 
ond input of said second adder; and 

said output of said second multiplier coupled to 
said first input of said second adder and said 
second input of said first adder. 

2. A processor as set forth in claim 1 further compris- 
ing: 

an accumulator having first and second inputs 
coupled to said outputs of said first and second 
adders, respectively, and an output coupled to said 
third inputs of said first and second adders, respec- 
tively. 

3. A processor as set forth in claim 2 further compris- 
ing: 

a memory coupled to said first and second in- 
puts of said first and second multipliers, respective- 
ly, for supplying to said multipliers said digital sig- 
nals from which said first and second products are 
to be calculated. 

4. A processor as set forth in claim 3 wherein: 

said output of said accumulator is further cou- 
pled to said memory. 

5. A processor as set forth in claim 4 wherein said 
memory comprises a first memory coupled to said 
first and second registers and a second memory 
coupled to said third and fourth registers. 

6. A processor as set forth in claim 5 wherein said first 
and second adders each comprise an arithmetic 
logic unit. 

7. A processor as set forth in claim 6 further comprising:

a first register coupled between said memory
and said first input of said first multiplier;

a second register coupled between said memory
and said second input of said first multiplier;

a third register coupled between said memory
and said first input of said second multiplier;
and

a fourth register coupled between said memory
and said second input of said second multiplier.

8. A processor as set forth in claim 7 further comprising:

a fifth register having an input coupled to said
output of said first multiplier and an output coupled
to said first input of said first adder and said
second input of said second adder; and

a sixth register having an input coupled to said
output of said second multiplier and an output
coupled to said second input of said first adder
and said first input of said second adder.

9. A method for performing in a digital processing apparatus
a calculation of:

$$X = b_1 + b_0 b_2 - c_0 c_2,$$

$$Y = b_1 - b_0 b_2 + c_0 c_2,$$

said method comprising the steps of:

(1) simultaneously multiplying in first and second
parallel multipliers $b_0 b_2$ and $c_0 c_2$, respectively;

(2) simultaneously calculating in first and second
parallel adders $b_1 + b_0 b_2 - c_0 c_2$ and
$b_1 - b_0 b_2 + c_0 c_2$, respectively.

10. A method as set forth in claim 9 wherein step (2)
comprises:

(2.1) providing $b_0 b_2$ from said first multiplier to
a first input terminal of said first adder and to a
second input terminal of said second adder;

(2.2) providing $c_0 c_2$ from said second multiplier
to a second input terminal of said first adder and
to a first input terminal of said second adder;

(2.3) providing $b_1$ to a third input terminal of
each of said first and second adders; and

(2.4) each of said adders separately adding
said values applied to their input terminals in
accordance with said equations $b_1 + b_0 b_2 - c_0 c_2$
and $b_1 - b_0 b_2 + c_0 c_2$, respectively.

11. A method for performing in a digital processing apparatus
a calculation of:

$$X(m+1) = b_1 + b_0 b_2 - c_0 c_2 + j\,(c_1 + b_0 c_2 + c_0 b_2),$$

$$Y(m+1) = b_1 - b_0 b_2 + c_0 c_2 + j\,(c_1 - b_0 c_2 - c_0 b_2),$$

said method comprising calculating the real parts of
X(m+1) and Y(m+1) by a method as set out in claim
9 or claim 10 and calculating the imaginary parts of
X(m+1) and Y(m+1) by a method as set out in claim
9 or claim 10.

12. A method as set out in claim 11 wherein said calculation
is a butterfly operation of a fast Fourier Transform.

13. A processor for performing fast Fourier Transforms
comprising a processor as claimed in any of claims 1 to 8.

14. A computer having a central processing unit, said
central processing unit comprising a processor as
claimed in any of claims 1 to 8.



EP 0 889 416 A2

[Drawings not reproduced. Labels that survive extraction:
FIG. 2 (frequency axis marked at $\omega_0$);
FIG. 3 (axis labeled AMPLITUDE);
FIG. 5 (butterfly with X(m), Y(m), W(N,k), X(m+1), Y(m+1));
FIG. 7 (memory subsystem 100 with memory bank A, memory
bank B and a memory pointer unit; bus 80; registers XA, YA,
YB and XB; ALU A 30; ALU B/BSU 32; accumulator register
file 34).]